KMID : 1155220220470040279
Journal of the Korean Society of Health Information and Health Statistics
2022, Volume 47, No. 4, pp. 279-289
A Study on the Availability of Survival Analysis of Lung Cancer Patients Using Synthetic Data
Yu Je-Hyeong

Lee Seung-Hee
Lee Seung-Hee
Kim Jong-Yeup
Son Ji-Woong
Ku Gwan-Woo
Lee Sue-Hyun
Abstract
Objectives: This pilot study investigated whether synthetic data can be generated and used for clinical analysis when the sample size of real data is insufficient. Because real data come with many limitations, such as ethical constraints and high cost, there have been many attempts to create realistic synthetic data. The focus here is on whether synthetic data can be used in place of real data.

Methods: This quasi-experimental study used synthetic data to analyze 11,978 lung cancer patients who received anticancer drug therapy. Clinically significant variables were extracted, and several tables containing patient status and treatment records were preprocessed. Propensity score matching was applied to reduce bias from covariates. The preprocessed data were then analyzed using Kaplan-Meier estimation and the Cox proportional hazards model.
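
The Kaplan-Meier step named in the Methods can be illustrated with a minimal sketch. The function name, the input layout, and the toy data below are assumptions made for illustration only; they are not the authors' code, and the numbers are not from the study.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, not taken from the study
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
```

In practice the estimate would be computed separately for each matched treatment group, and the resulting curves compared with a log-rank test.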

Results: The survival curves plotted from the synthetic data did not match the curves from the actual data for the other covariates. In Cohort 1, Gen I had a better 5-year overall survival (OS) than Gen II [S1 = 0.973, S2 = 0.953, p < 0.05]. Similarly, in Cohort 2, the Gen I anticancer drug had a better 5-year OS than Gen III [S1 = 0.990, S3 = 0.884, p < 0.05]. In the exploratory subgroup analysis, the hazard ratio (HR) was estimated using the Cox regression model. Gen I showed a more favorable HR than Gen II and Gen III. However, these results differed from the actual trend.
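
For reference, the hazard ratio from a Cox regression model is conventionally defined as follows. The notation is standard textbook notation rather than anything given in the abstract, and the sketch assumes the drug generation enters the model as an indicator variable with Gen I as the reference group.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\mathrm{HR}_{\text{Gen II vs. Gen I}}
  = \frac{h(t \mid x_{\text{Gen II}})}{h(t \mid x_{\text{Gen I}})}
  = \exp\!\left(\beta_{\text{Gen II}}\right)
```

Under this assumed coding, an HR above 1 for Gen II or Gen III relative to Gen I corresponds to a higher hazard of death than in the Gen I group.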

Conclusions: The analysis based on the DATA-FREE-BOX data differed from the trend of the survival analysis conducted with real data, so trends derived from this synthetic data may not reflect the real trend. This finding can contribute to data validation. Moreover, the methodology used in this study is expected to be applicable to clinical studies based on actual data.
KEYWORD
Survival analysis, Synthetic data, Kaplan-Meier estimation, Cox regression model, Lung cancer
Listed journal information
Korea Research Foundation (KCI)